A Fast Greedy Algorithm for Outlier Mining

Authors

  • Zengyou He
  • Shengchun Deng
  • Xiaofei Xu
  • Joshua Zhexue Huang
Abstract

The task of outlier detection is to find small groups of data objects that are exceptional when compared with the remaining large amount of data. Recently, the problem of outlier detection in categorical data was defined as an optimization problem, and a local-search heuristic-based algorithm (LSA) was presented. However, as is the case with most iterative algorithms, LSA is still very time-consuming on very large datasets. In this paper, we present a very fast greedy algorithm for mining outliers under the same optimization model. Experimental results on real datasets and large synthetic datasets show that (1) our new algorithm performs comparably to state-of-the-art outlier detection algorithms at identifying true outliers, and (2) our algorithm can be an order of magnitude faster than LSA.
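The abstract does not spell out the optimization model, so the following is only a rough sketch of what a greedy pass over categorical data could look like. It assumes the objective is to pick k records whose removal minimizes the summed per-attribute Shannon entropy of the remaining data. That objective, and the names `greedy_outliers`, `total_entropy`, and `attribute_entropy`, are illustrative assumptions rather than details taken from this page. The sketch recomputes the entropy for every candidate, so it is not the optimized procedure the paper reports.

```python
from collections import Counter
from math import log2


def attribute_entropy(counts, n):
    """Shannon entropy of one categorical attribute, given its value counts."""
    return -sum((c / n) * log2(c / n) for c in counts.values() if c > 0)


def total_entropy(counters, n):
    """Sum of per-attribute entropies (assumed objective, see lead-in)."""
    if n == 0:
        return 0.0
    return sum(attribute_entropy(c, n) for c in counters)


def greedy_outliers(records, k):
    """Greedily pick k record indices; each pass takes the record whose
    removal leaves the remaining data with the lowest total entropy."""
    remaining = set(range(len(records)))
    n_attrs = len(records[0])
    # One value-count table per attribute, updated as records are removed.
    counters = [Counter(r[j] for r in records) for j in range(n_attrs)]
    outliers = []
    for _ in range(min(k, len(records))):
        best_idx, best_entropy = None, float("inf")
        for i in sorted(remaining):
            # Tentatively remove record i, score the rest, then restore it.
            for j, v in enumerate(records[i]):
                counters[j][v] -= 1
            e = total_entropy(counters, len(remaining) - 1)
            for j, v in enumerate(records[i]):
                counters[j][v] += 1
            if e < best_entropy:
                best_idx, best_entropy = i, e
        # Commit the best removal found in this pass.
        for j, v in enumerate(records[best_idx]):
            counters[j][v] -= 1
        remaining.remove(best_idx)
        outliers.append(best_idx)
    return outliers


# Toy data: four identical records and one that differs in both attributes.
data = [("a", "x")] * 4 + [("b", "y")]
print(greedy_outliers(data, 1))  # [4]
```

Each pass commits to one record and never revisits it, which is what makes the approach greedy rather than an iterative local search that swaps candidates in and out.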


Similar resources

Implementing Outlier Detection using Greedy Based Information Theoretic Algorithms and its Comparison with PSO and ACO Optimization Techniques

An outlier is defined as an observation that deviates too much from other observations. The identification of outliers can lead to the discovery of useful and meaningful knowledge. Outlier detection has been extensively studied in the past decades. However, most existing research focuses on the algorithm based on special background, compared with outlier detection approach is still rare. Most soph...


CURIO: A Fast Outlier and Outlier Cluster Detection Algorithm for Large Datasets

Outlier (or anomaly) detection is an important problem for many domains, including fraud detection, risk analysis, network intrusion and medical diagnosis, and the discovery of significant outliers is becoming an integral aspect of data mining. This paper presents CURIO, a novel algorithm that uses quantisation and implied distance metrics to provide a fast algorithm that is linear for the numb...


Outlier Detection in Survival Analysis

Outlier detection is an important task in many data-mining applications. In this paper, we present two parametric outlier detection methods for survival data. Both methods propose to perform outlier detection in a multivariate setting, using the Cox regression as the model and the concordance c-index as a measure of goodness of fit. The first method is a single-step procedure that presents a de...


Efficiently Mining Regional Outliers in Spatial Data

With the increasing availability of spatial data in many applications, spatial clustering and outlier detection have received a lot of attention in the database and data mining community. As a very prominent method, the spatial scan statistic finds a region that deviates (most) significantly from the entire dataset. In this paper, we introduce the novel problem of mining regional outliers in spa...


A Study of Clustering Based Algorithm for Outlier Detection in Data streams

Recently, many researchers have focused on mining data streams and have proposed many techniques and algorithms for them. Data stream mining refers to the process of extracting knowledge from nonstop, fast-growing data records. These techniques include data stream classification, data stream clustering, frequent pattern mining on data streams, and so on. Data stream clustering techniques are highly helpful to cluster the si...



Publication date: 2006